Clustering Based on Conditional Distributions in an Auxiliary Space
Authors
Abstract
We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous by the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling the traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data.
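The stated equivalence between the distortion and mutual information can be checked numerically in a simplified setting. The following is a minimal sketch, not the online algorithm from the paper: it assumes hard nearest-prototype (Voronoi) clusters and plug-in empirical estimates, and all function names are hypothetical. Under those assumptions the KL-based distortion equals, up to a term that does not depend on the clustering, the average negative log-likelihood of the auxiliary labels under each cluster's empirical label distribution, and that quantity is exactly H(C) - I(C; V) for the empirical distribution.

```python
import numpy as np

def nearest_prototype(x, prototypes):
    """Hard (Voronoi) assignment of each primary sample to its closest prototype."""
    d2 = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def contingency(clusters, labels, n_clusters, n_classes):
    """Count matrix: rows are clusters, columns are auxiliary classes."""
    counts = np.zeros((n_clusters, n_classes))
    np.add.at(counts, (clusters, labels), 1.0)
    return counts

def _plogq(p, q):
    """Sum of p * log(q) over entries with p > 0 (so 0 * log 0 counts as 0)."""
    mask = p > 0
    return float((p[mask] * np.log(q[mask])).sum())

def conditional_log_loss(counts):
    """Average of -log psi[j, c], with psi[j] the empirical p(c | cluster j)."""
    p = counts / counts.sum()
    p_cluster = p.sum(axis=1, keepdims=True)
    psi = np.divide(p, p_cluster, out=np.zeros_like(p), where=p_cluster > 0)
    return -_plogq(p, psi)

def mutual_information(counts):
    """Plug-in estimate of I(C; V) in nats from the count matrix."""
    p = counts / counts.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    return _plogq(p, p) - _plogq(p, outer)

def class_entropy(counts):
    """Plug-in estimate of H(C) in nats."""
    p_class = counts.sum(axis=0) / counts.sum()
    return -_plogq(p_class, p_class)

# Synthetic illustration (assumed data, not from the paper): binary auxiliary
# labels that depend noisily on the first primary coordinate.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
labels = (x[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
prototypes = np.array([[-1.0, 0.0], [1.0, 0.0]])
counts = contingency(nearest_prototype(x, prototypes), labels, 2, 2)
print(conditional_log_loss(counts))
print(class_entropy(counts) - mutual_information(counts))
```

The two printed values agree up to floating-point error. Since H(C) is fixed by the data, any change of prototypes that lowers the log loss raises the mutual information between clusters and auxiliary labels by the same amount.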
Similar resources
Clustering by Similarity in an Auxiliary Space
We present a clustering method for continuous data. It defines local clusters into the (primary) data space but derives its similarity measure from the posterior distributions of additional discrete data that occur as pairs with the primary data. As a case study, enterprises are clustered by deriving the similarity measure from bankruptcy sensitivity. In another case study, a content-based clus...
Identification of Plastic Wastes by Using Fuzzy Radial Basis Function Neural Networks Classifier with Conditional Fuzzy C-Means Clustering
Techniques for recycling and reusing plastics attract public attention, and this attention and need drive improvements in recycling techniques. However, the identification of black plastic wastes still has a major problem: the spectrum extracted by near-infrared spectroscopy is not clear and is contaminated by noise. To overcome this problem, we apply Raman spectro...
Exploiting auxiliary distributions in stochastic unification-based grammars
This paper describes a method for estimating conditional probability distributions over the parses of "unification-based" grammars which can utilize auxiliary distributions that are estimated by other means. We show how this can be used to incorporate information about lexical selectional preferences gathered from other sources into Stochastic "Unification-based" Grammars (SUBGs). While we apply ...
N-best based stochastic mapping on stereo HMM for noise robust speech recognition
In this paper we present an extension of our previously proposed feature space stereo-based stochastic mapping (SSM). As distinct from an auxiliary stereo Gaussian mixture model in the front-end in our previous work, a stereo HMM model in the back-end is used. The basic idea, as in feature space SSM, is to form a joint space of the clean and noisy features, but to train a Gaussian mixture HMM i...
A Topography-Preserving Latent Variable Model with Learning Metrics
We introduce a new mapping model from a latent grid to the input space. The mapping preserves the topography but measures local distances in terms of auxiliary data that implicitly conveys information about the relevance or importance of local directions in the primary data space. Soft clusters corresponding to the map grid locations are defined into the primary data space, and a distortion mea...
Journal: Neural Computation
Volume 14, Issue 1
Pages: -
Publication date: 2002